Amendments to the Claims 



This listing of claims will replace all prior versions, and listings, of the claims: 

1. (original) A method of downloading data sets by a plurality of web crawlers from 
among a plurality of host computers, comprising the steps of: 

assigning a web crawler identifier to each one of the plurality of web crawlers; 
for each respective web crawler: 

downloading at least one data set that includes addresses of one or more 
referred data sets; 

identifying the addresses of the one or more referred data sets, wherein each 
identified address includes a host computer identifier; 
for each identified address: 

generating a representation of the host computer identifier; 

determining a web crawler identifier to which the representation 
corresponds; and 

when the determined web crawler identifier is not assigned to the 
respective web crawler, sending the identified address to the web crawler to which the 
determined web crawler identifier is assigned. 
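
By way of illustration only, and not as part of the claim listing, the per-address steps recited in claim 1 might be sketched as follows. The names host_identifier, distribute, crawler_id_for_host, and send are hypothetical and do not appear in the application; the mapping from host to web crawler identifier is passed in as a function because claim 1 does not fix how the representation is generated.

```python
from urllib.parse import urlsplit


def host_identifier(address):
    """Return the host part of an address (hypothetical helper)."""
    return (urlsplit(address).hostname or "").lower()


def distribute(addresses, my_id, crawler_id_for_host, send):
    """Keep addresses assigned to this crawler and send the rest onward.

    crawler_id_for_host: assumed callable mapping a host identifier to
        a web crawler identifier.
    send: assumed callable taking (crawler identifier, address).
    """
    kept = []
    for address in addresses:
        owner = crawler_id_for_host(host_identifier(address))
        if owner == my_id:
            # The address belongs to the respective web crawler.
            kept.append(address)
        else:
            # The determined identifier is assigned to another crawler.
            send(owner, address)
    return kept
```

Because the determination depends only on the host computer identifier, every address on a given host resolves to the same web crawler identifier in this sketch.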

2. (original) The method of claim 1, wherein 

the plurality of web crawlers consists of n web crawlers; and 

generating the representation includes computing a hash function of the host 
computer identifier to generate an integer value that is a member of a set of n predefined 
distinct values. 

3. (original) The method of claim 1, wherein 

the plurality of web crawlers consists of n web crawlers; and 

generating the representation includes computing a hash function of the host 
computer identifier to generate an intermediate value V, and computing V modulo n. 
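
By way of illustration only, and not as part of the claim listing, the hash-then-modulo computation recited in claim 3 might be sketched as follows. The function name crawler_index is hypothetical, and the choice of SHA-256 is an assumption; the claim does not name a particular hash function.

```python
import hashlib


def crawler_index(host_identifier, n):
    """Map a host computer identifier to one of n values, 0 through n - 1."""
    # Assumed hash function: SHA-256, truncated to its first 8 bytes.
    digest = hashlib.sha256(host_identifier.encode("utf-8")).digest()
    v = int.from_bytes(digest[:8], "big")  # intermediate value V
    return v % n  # V modulo n
```

A hash from hashlib is used in this sketch rather than Python's built-in hash(), because the built-in string hash is randomized per process by default and so would not give the same result on every web crawler.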

4. (original) The method of claim 1, wherein the sending step includes: 

determining a web crawler address for the web crawler to which the determined 
web crawler identifier is assigned; 

transmitting the identified data set address to the destination web crawler at the 
determined web crawler address. 

5. (canceled) 

6. (original) A web crawler system for downloading data set addresses from among a 
plurality of host computers, comprising: 

a plurality of web crawlers, wherein each web crawler has been assigned a web 
crawler identifier; 

for each respective web crawler: 

a main web crawler module for downloading and processing data sets stored 
on a plurality of host computers, the main web crawler module identifying addresses of 
the one or more referred data sets in the downloaded data sets, wherein each identified 
address includes a host computer identifier; and 

an address distribution module for processing the identified addresses, the 
address distribution module including instructions for: 

generating a representation of the host computer identifier, wherein the 
representation corresponds to one of the web crawler identifiers; 

determining a web crawler identifier to which the representation 
corresponds; and 

when the determined web crawler identifier is not assigned to the 
respective web crawler, sending the identified address to a destination web crawler 
comprising the web crawler to which the determined web crawler identifier is assigned. 

7. (original) The web crawler system of claim 6 wherein 

the plurality of web crawlers consists of n web crawlers; and 

the address distribution module's instructions for generating the representation 
include instructions for computing a hash function of the host computer identifier to 
generate an intermediate value V, and computing V modulo n. 

8. (original) The web crawler system of claim 6, further comprising: 

for each respective web crawler, a web crawler interface for transmitting the 
identified address to the destination web crawler and for receiving identified addresses 
from each of the plurality of web crawlers other than the respective web crawler. 

9. (original) The web crawler system of claim 6, further comprising: 

for each respective web crawler, a lookup table storing for each of the plurality of 
web crawler identifiers a corresponding web crawler address, said lookup table for use by 
the address distribution module in determining a web crawler address to which to send 
the identified data set address. 
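
By way of illustration only, and not as part of the claim listing, the lookup table recited in claim 9 might be sketched as a mapping from web crawler identifier to web crawler address. The names send_to_owner and transmit, and the example addresses used with them, are hypothetical; the transport used to reach the destination web crawler is left as an assumed callable.

```python
def send_to_owner(identified_address, owner_id, lookup_table, transmit):
    """Look up the destination crawler's address and transmit to it.

    lookup_table: assumed dict of web crawler identifier -> crawler address.
    transmit: assumed callable taking (crawler address, identified address).
    """
    crawler_address = lookup_table[owner_id]  # KeyError if the id is unknown
    transmit(crawler_address, identified_address)
    return crawler_address
```

For example, with a hypothetical table {0: "crawler0.example:9000", 1: "crawler1.example:9000"}, an address determined to belong to identifier 1 would be transmitted to "crawler1.example:9000".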

10. (original) A computer program product for use in conjunction with a web crawler 
system wherein each web crawler is assigned a web crawler identifier, the computer 
program product comprising a computer readable storage medium and a computer 
program mechanism embedded therein, the computer program mechanism comprising: 

a main web crawler module for downloading and processing data sets stored on a 
plurality of host computers, the main web crawler module identifying addresses of the 
one or more referred data sets in the downloaded data sets, wherein each identified 
address includes a host computer identifier; and 

an address distribution module for processing the identified addresses, the address 
distribution module including instructions for: 

generating a representation of the host computer identifier, wherein the 
representation corresponds to one of the web crawler identifiers; 

determining a web crawler identifier to which the representation corresponds; and 

when the determined web crawler identifier is not assigned to the respective 
web crawler, sending the identified address to a destination web crawler comprising the 
web crawler to which the determined web crawler identifier is assigned. 

11. (original) The computer program product of claim 10, wherein: 

the web crawler system consists of n web crawlers; and 

the address distribution module's instructions for generating the representation 
include instructions for computing a function of the host computer identifier to generate 
an integer value that is a member of a set of n predefined distinct values. 

12. (original) The computer program product of claim 10, wherein: 

the web crawler system consists of n web crawlers; and 

the address distribution module's instructions for generating the representation 
include instructions for computing a hash function of the host computer identifier to 
generate an intermediate value V, and computing V modulo n. 

13. (original) The computer program product of claim 10, further comprising: 

a web crawler interface for transmitting the identified address to the destination web 
crawler and for receiving identified addresses from each of the plurality of web crawlers 
other than the respective web crawler. 

14. (original) The computer program product of claim 10, further comprising: 

a lookup table storing for each of the plurality of web crawler identifiers a 
corresponding web crawler address, said lookup table for use by the address distribution 
module in determining a web crawler address to which to send the identified data set 
address. 

15. (new) The method of claim 1, wherein each respective web crawler includes multiple 
threads to download and process documents from a plurality of host computers. 

16. (new) The web crawler system of claim 6 wherein each of the plurality of web 
crawlers includes multiple threads to download and process documents from a plurality 
of host computers. 

17. (new) The computer program product of claim 10 wherein each web crawler includes 
multiple threads. 

18. (new) The computer program product of claim 17 wherein each thread executes a 
main web crawler module. 